Optimal noise level for coding with tightly balanced networks of spiking neurons in the presence of transmission delays

Neural circuits consist of many noisy, slow components, with individual neurons subject to ion channel noise, axonal propagation delays, and unreliable and slow synaptic transmission. This raises a fundamental question: how can reliable computation emerge from such unreliable components? A classic strategy is to simply average over a population of N weakly-coupled neurons to achieve errors that scale as 1/N. But more interestingly, recent work has introduced networks of leaky integrate-and-fire (LIF) neurons that achieve coding errors that scale superclassically as 1/N by combining the principles of predictive coding and fast and tight inhibitory-excitatory balance. However, spike transmission delays preclude such fast inhibition, and computational studies have observed that such delays can cause pathological synchronization that in turn destroys superclassical coding performance. Intriguingly, it has also been observed in simulations that noise can actually improve coding performance, and that there exists some optimal level of noise that minimizes coding error. However, we lack a quantitative theory that describes this fascinating interplay between delays, noise and neural coding performance in spiking networks. In this work, we elucidate the mechanisms underpinning this beneficial role of noise by deriving analytical expressions for coding error as a function of spike propagation delay and noise levels in predictive coding tight-balance networks of LIF neurons. Furthermore, we compute the minimal coding error and the associated optimal noise level, finding that they grow as power-laws with the delay. Our analysis reveals quantitatively how optimal levels of noise can rescue neural coding performance in spiking neural networks with delays by preventing the build up of pathological synchrony without overwhelming the overall spiking dynamics. This analysis can serve as a foundation for the further study of precise computation in the presence of noise and delays in efficient spiking neural circuits.


A.1 Models
We consider a network of N leaky integrate-and-fire neurons whose membrane potentials undergo the dynamics , and neuron i emits a spike when V i > 1 2 . (S1) V i is the i th neuron's membrane potential, "˙" indicates a time derivative, λ V controls the strength of the leak, N is the input current, o j (t) is the j th neuron's spike train (represented as a sum of Dirac δ functions), ∆ = δ N is the spike propagation delay, σ controls the noise level, η i are independent Wiener processes, and the firing threshold is 1 2 . Note that neurons self-reset instantaneously to V i = − 1 2 . Note that the power of the time scale τ is chosen to ensure that each term in Eq. (S1) has the same units as membrane voltage, which here we take to be dimensionless. To see this note that a δ-function has units of inverse time (so multiplying any spike train o i (t), which is composed of δ-functions, by τ makes these terms dimensionless) and note that the Wiener process η i (t) obeys η i (t)η j (t ) t = δ ij δ(t − t ), implying that η i (t) itself has units of inverse square root of time. Thus we multiply the last term by √ τ to ensure that the standard deviation σ has the same units as the dimensionless membrane voltage. Finally, note the leak term λ V is also dimensionless.
Our goal is to derive an expression that describes the readout error, which we define as the standard deviation σ readout of the readoutx(t), as a function of the number of neurons N , the delay δ, and the noise level σ.
We derive our central result-an approximate upper-bound for σ readout in the δ τ limit-in Subsection A.4. But before tackling the full complexity of this system, we study a simpler system that exhibits the same noise tradeoff in the presence of delay: a tight-balance network of soft-threshold neurons. For this soft-threshold dynamics, we derive an exact expression for σ readout in the large N , small delay δ τ limit. The neurons undergo the dynamics (S5) To understand the behavior of this model, consider a simple situation where the membrane voltages V i of all the neurons are the same and equal to 0, as would be the case when there is no external input, and if there were an additional leak term to ensure all membrane potentials decay to 0. Then suppose the external input equal to N is turned on at time t = 0. All the membrane potentials will rise linearly at a rate N/τ and become superthreshold when they reach V i = 1 2 together at the same time τ /2N . The entire population will remain superthreshold until the first neuron in the network spikes. Below, we will denote the time interval from the simultaneous superthreshold crossing time to the time of first spike by the random variable t first . At this time t first , the membrane potential of the first neuron to spike will be immediately reset (i.e. decremented by 1). The rest of the network however, will first receive inhibition from this initial spike only at time t first + ∆, due to the transmission delay ∆. Of course during the entire time all membrane potentials are still increasing at a rate N/τ due to the external input. Thus, as long as the time t first + ∆ is less than τ /N , the entire population will be subthreshold again after responding to delayed inhibition from the first spike. More generally, we refer to the interval of time between the entire population crossing above threshold and then returning to subthreshold as a superthreshold interval. The smallest this superthreshold interval can be is the t first + ∆, assuming the delayed inhibition from a single spike is sufficient to return the entire population to subthreshold state. Of course it could be longer if delayed inhibition from multiple spikes is required to return the entire population to a subthreshold state. As we see below, we will be working with short delays ∆ and high enough superthreshold firing rates ρ so that typically a single spike suffices.
To find interesting, high performing operating regimes of this model, we consider the statistics of spiking during a superthreshold interval. First, for any individual superthreshold neuron, the probability of emitting a spike in a short time interval of duration dt is ρ × dt, and the time to the next spike of that neuron is exponentially distributed with mean 1/ρ. Thus the mean inter-spike interval (ISI) of a single super-threshold neuron is 1/ρ. However, if all N neurons cross threshold simultaneously, at some reference time t = 0, then the time t first to the first spike in the entire network is exponentially distributed with mean 1 N ρ . Now if any individual neuron spikes, it can potentially immediately return to a subthreshold state through its own reset mechanism that decrements the membrane voltage by 1. However we will be operating in a large N regime with a small number of neurons spiking between the first spike at t first and the first network wide inhibition due to that spike at time t first + ∆. So for large N we can neglect the small number of neurons that may have returned to a subthreshold state due to their own spiking during this interval [t first . . . t first + ∆], and simply assume all N neurons are superthreshold during this interval. Thus the superthreshold network ISI, defined to be the mean interval between any successive pair of spikes occurring anywhere in the network, is 1 N ρ . So during any part of the superthreshold interval of duration equal to the delay ∆, the expected number of spikes to occur is a Poisson random variable with mean λ = N ρ∆. In order to ensure in the large N limit that a large number of spikes do not impact the readout before the first spike has a chance to inhibit the network, we therefore choose ∆ = δ/N where δ is O(1). This ensures that λ = ρδ is O(1), and so O(1) spikes occur between the time the first spike occurs at t first , and the time that spike first has a chance to inhibit the network at time t first + ∆. Also, with this scaling of the delay ∆, the shortest possible superthreshold interval, on average is simply the sum of the mean of the time to first spike t first (which is 1 N ρ ) plus the delay (which is ∆ = δ N ). During this mean time, the membrane voltages, which are still integrating at a rate N τ , rise above threshold by an amount N τ ( 1 N ρ + δ N ) = 1 τ ρ + δ τ . As long as this quantity is less than 1, then on average, delayed inhibition from a single spike is sufficient to end the superthreshold interval. This quantity will be less than 1 at small delays δ, and small mean time 1 ρ to first spike in a single neuron, all relative to the single neuron integration time τ .
In summary, if the population first becomes superthreshold at a reference time t = 0, then: (1) the mean of the time t first to the first network spike is 1 N ρ ; (2) the mean time when the network first receives delayed inhibition from this first spike is 1 N ρ + δ N ; (3) the mean number of extra, or spurious, network spikes during this delay period is λ = ρδ; and (4) the final membrane voltage at the end of this period (right before the inhibition) is 1 τ ρ + δ τ . We will focus our analysis on the following limits. First, in order to focus on regimes of good encoding performance with low error, we assume the mean excess number of spurious spikes λ = ρδ 1. Second, for ease of analysis we focus on the regime in which once the network is superthreshold, a single spike can on average return the network to a subthreshold state, and moreover, the few neurons that do spike during the superthreshold interval and are reset, do not return to a superthrehsold state before the rest of the network becomes subthreshold. For the first condition, we require 1 τ ρ + δ τ = δ τ ( 1 λ + 1) 1 which implies δ τ 1 since we are already assuming λ 1. Furthermore, consider a neuron that fired during the superthreshold state and was immediately reset to a subthreshold state corresponding to a membrane voltage that is below threshold by an O(1) amount. Due to integration of the external stimulus at a rate N τ it will return to a superthreshold state within a time O( τ N ). As long as the delay ∆ = δ N is much less than τ N , the entire population will become subthreshold before this reset neuron becomes superthreshold again. Thus, in the combined limit λ 1 and δ τ , the entire network operates in a simple fashion: all membrane potentials cross threshold together, a single first spike occurs, and then after a delay ∆ the entire network becomes subthreshold again. Then after a time O( τ N ) all the membrane potentials cross threshold simultaneously again. Interestingly as we see below, performance is actually best when the network behaves in this fashion.

A.2 Readout error for the soft-threshold model
First, we integrate Eq. (S3) to write the readout at a given time t as a sum of exponential kernels from all spikes in the past.
Here, ∆t k = t − t k where t k is the time of the k'th spike in the past from any neuron in the network. Note that the time between any successive pair of network spikes scales with N as O(1/N ). To see this, consider two possible cases. First consider the case where the entire population is superthreshold, and a spike occurs. After a delay of ∆ = δ/N , this spike will inhibit the entire network, causing the entire population to become subthreshold. However, before this happens, spikes can occur in any neuron at a rate N ρ and therefore the typical ISI between any pair of spikes that occur in the same superthreshold interval is O( 1 N ρ ). Now consider the the second case: the final spike in one superthreshold interval and the first spike in the next superthreshold interval. The time interval between these two spikes decomposes into 3 parts: (1) the waiting time for network-wide inhibition to return to a subthreshold state; (2) the time it takes for the external stimulus to drive the population back to a superthreshold state; (3) the waiting time t first for the first network spike to occur after becoming superthreshold. For (1), because we are working in a limit where a single spike can typically return the network to a subthreshold state, the waiting time for returning to a subthreshold state is at most the delay ∆ = δ N from the last spike (and is shorter if there were earlier spikes in the same superthreshold event before the last spike). For (2), we note that the mean number of spikes in a superthreshold event is 1 + λ (the first spike plus the mean number λ of spurious spikes). Each of these spikes will eventually decrement the membrane voltage by an O(1) amount. Because λ 1, the membrane voltage of all neurons will never descend below threshold by more than an O(1) amount, and because the membrane voltages integrate the external drive at a rate N τ , they will recover to a superthreshold state in a time O( τ N ). Finally, for (3), the mean of the time t first to the first network spike is 1 N ρ . Thus the sum of these 3 intervals is

A.2.1 Readout at a single point in time
In order to compute σ readout , we need to compute the standard deviation inx(t) over a time window. To do so, we first compute the statistics of the readout at a particular point in time t thres , defined as the time when the entire network reaches threshold together. We denote the readout at this time by ξ :=x(t thres ).
To compute the mean and variance of ξ, we need to understand the statistics of the past times ∆t k of all previous spikes, as these times contribute to the readout through Eq. (S6). Figure A shows a schematic for the trajectory of the population of membrane potentials undergoing the dynamics Eq. (S5). Assuming the neural population starts with initial conditions V i = 0 ∀i, the population reaches threshold together (the first threshold-crossing shown in the schematic). Let z k denote the duration of time between this threshold crossing and the first spike anywhere in the network. As discussed above z k is an exponentially-distributed random variable with mean 1 N ρ . Thereafter, there is a delay time ∆ = δ N during which the other N − 1 ≈ N neurons remain superthreshold and continue to fire probabilistically. In a case where no extra spurious spikes occur, the rest of the membrane potentials rejoin the first-firing neuron's membrane potential after receiving the inhibitory spike. The population then continues toward threshold, and the process repeats. Occasionally (due to our assumption of small λ) one (or more) spurious spikes may occur. For example, in the third superthreshold interval in Figure A, two spikes occur at times z 2 and z 3 after the time the population crosses threshold. Each of these spikes will decrement the membrane voltage of all neurons by 1, and the entire population will recover arriving at the next superthreshold crossing over a time proportional to the total number of spikes in the previous superthreshold interval. (S5) generate a history of spike-times ∆t k relative to the moment the membrane potnetials reach threshold again, t thres . The spike-times include variation z k from the probabilistic first-spike wait-time due to the soft threshold, and the possibility of spurious synchronous spikes during the delay ∆. Bottom: integrating the readout deviation over l-spike events in the computation of σ readout . E(ξ), the mean readout at the time t thresh , is used as a reference point to compute the mean readout and the squared deviation from that mean, integrated over time.
For ease of exposition, let us first compute the mean and variance of ξ in the limit of extremely small λ. In this limit, the most probable scenario by far in each superthreshold interval, is that only a single first spike occurs, with no subsequent extra spurious spikes. Later we will consider corrections due to extra spurious spikes that become relevant at larger λ. In this small λ limit, there is a one-to-one correspondence between individual spikes and the previous superthreshold crossing event of the population. We can use this correspondence to calculate the times t k = t thres − ∆t k of the k spike in the past. First, let c k be the time at which the population crossed threshold just before the time t k of the k'th spike in the past. Thus by the definition of z k we have t k = c k + z k . Furthermore note that for any k the time between successive population superthreshold obeys c k −c k+1 = τ N . To see this, note that by definition, at time c k+1 the membrane potentials are at threshold V i = 1 2 . During the entire time from c k+1 to c k , the membrane potentials are rising at a rate N τ , but they also suffer a decrement of 1 when the neuron that spiked resets its own membrane potential and when the rest of the neurons receive delayed inhibition from this spike. This combination of integration at a constant rate N τ , plus a decrement of 1, implies the next time the population crosses threshold will be at time c k = c k+1 + τ N . This constancy of the time between population superthreshold crossings then implies that c k = t thres − k τ N and so t k = c k + z k = t thres − k τ N + z k , and thus ∆t k = t thres − t k = k τ N − z k . Inserting this formula for ∆t k into Eq. (S6), we obtain We can see that ξ is an infinite sum of uncorrelated random variables. Using the fact that the variables z k τ are exponentially distributed with mean 1 N ρτ , we have where the approximate expressions denote the terms to leading orders in 1 N . We can sum the geometric series, using the independence of the z k , to obtain We note some important properties of E(ξ) and var(ξ). First, to leading order in 1 N , E(ξ) = 1, which is expected for the constant stimulus x(t) = 1. The next leading order term in E(ξ) decreases as O( 1 N ). Interestingly, in the limit of large ρ, the subleading term approaches − 1 2N . This simple behavior can be understood as follows. In the large ρ limit, the network spikes very soon after it crosses superthreshold. This single spike increases the readout by exactly 1/N . The readout subsequently decays at a rate τ but within a time O(1/N ) the membrane potentials recover to be subthreshold. By this time the readout has decayed to below 1 and is ready to be boosted to above 1 by the next spike. Thus, when we measure the readout specifically at the time the population has crossed threshold, before this next spike, the readout will tend to underestimate the input by an O(1/N ) amount, while right after the spike, the readout will overestimate the input by an O(1/N ) amount. In this fashion, the readout will zig-zag about the correct value. In fact it zig-zags symmetrically about the correct value (in the large ρ limit), and is thus 1 2 1 N below 1 at threshold crossing and 1 2 1 N above 1 just after the spike. In fact this zig-zag behavior of the readout cannot be reduced either by reducing delays (by reducing δ) or reducing variability in the time to first spike (by increasing ρ); it remains when δ = 0 and ρ is large.
Further note that the standard deviation of the readout at the time the population crosses threshold (i.e. var(ξ) in Eq. (S15)) is O( 1 N 3/2 ), and so is much smaller than the typical O(1/N ) amplitude of the zig-zag behavior of the readout about the correct value. Thus the variability in spike timing does not contribute appreciable variance to the readout ξ at the single time t thres . This can be understood intuitively, as follows. First, the time to spike after a threshold crossing has variance O(1/N 2 ) because N neurons are racing to spike first. However, even though the decoder only integrates over a time scale τ , the network generates spikes at a rate that is O(N ). Thus the decoder value ξ at time t thres feels variability from the last O(N ) spikes in the past; the variabiilty in the timing of these spikes all contribute to the variance of the readout at t thres . The combination of O(N ) spikes thus leads to an amplification of readout variance by a factor of N due to the addition law of variance of independent variables. But furthermore, the readout averages over N neurons leading to a multiplicative reduction of variance by 1/N 2 . Thus the total variance of the readout at any fixed time t thres due to the combination of decoder averaging, decoder integration, and network racing to first spike is O( 1 Note that we have shown that the standard deviation of ξ, or the readout at time t thres is negligible and only O( 1 N 3/2 ) in the small λ limit where only single spikes in any given superthreshold interval occur. Below, we will show that this conclusion still holds, even at larger λ when multiple spikes could occur with appreciable probability in a single superthreshold interval.

A.2.2 Mean readout error
So far, we have computed the mean and variance of the readout at a specific time t thres . However, our fundamental goal is to compute the mean and variance of the readout across time. In doing so we will find that both the bias and the variance of the readout will be O(1/N ). To see this, consider integrating the readout across a long, contiguous time interval that starts at time t = 0 at the beginning of a superthreshold interval and that ends at the start of a much later superthreshold interval, at a time t end τ /N . Then the mean readout is given by and with this definition, the bias of the readout is The variance of the readout is given by (S17) Importantly, many superthreshold intervals occur in this long contiguous time interval from t = 0 to t end . To see this, recall that each superthreshold interval contains O(1) spikes because the mean number of spurious spikes λ 1, and that a single spike inhibits the membrane potential population by 1. Thus after each superthreshold interval, it takes O(1/N ) time for the population to return to superthreshold because the membrane potentials are driven by an input current N/τ . For t end τ /N then, many superthreshold intervals occur in the time between t = 0 and t end . Now, to evaluate Eq. (S16) and Eq. (S17), we wish to express the trajectory ofx(t) during and between these superthreshold intervals. Thus it is useful to introduce the notion of an "event", which we define to be the interval of time from the beginning of a superthreshold interval to the beginning of the subsequent superthreshold interval. By definition, each event contains a single superthreshold interval at its beginning. During this superthreshold interval, recall that a random number l + 1 spikes occur, where l is a Poisson random variable with mean λ. And after the superthreshold interval and before the end of the event, no further spikes occur since the population is subthreshold. Thus we can categorize events with the following nomenclature: events containing 1 spike are referred to as 1-spike events, events containing 2 spikes are referred to as 2-spike events, and so on. We can use this definition of event to simplify our primary task of integrating over the long time interval from t = 0 to t end in Eq. (S16) and Eq. (S17) because the trajectoryx(t) can be represented as a sequence of events-one event starting after another, stitched together-and thus the integrals in Eq. (S16) and Eq. (S17) can be subdivided into a sum of many smaller integrals, with one integral for the duration of each event. Let M be the (large) number of events in the long time interval from t = 0 to t end , then Eq. (S16) can be written as where is the readout integrated from the i'th event's beginning, t i , to its end, t i+1 .
(The beginning of a subsequent event is the end of the preceding event, and the last event ends at time t M +1 = t end .) Similarly, Eq. (S17) can be written as Now we must compute the sums of µ i and s i in Eq. (S18) and Eq. (S20), and to help us do so, we will take advantage of an important property ofx(t) that we derived earlier. Namely, that the value of the readout at the time of threshold-crossing ξ :=x(t thres ) has O(1/N 3/2 ) fluctuations. Since we will soon see that the variance of the readout σ readout is O(1/N ), which is large compared to O(1/N 3/2 ), we can consider the value at the readout at threshold-crossing as effectively fixed at the value E(ξ). And since every event begins and ends at threshold-crossings, the readout at the beginning and end of each event is effectively fixed at the value E(ξ). Thus each µ i and s i do not depend on the history ofx(t)-the value of the readout at the endpoints of integration in Eq. (S19) and Eq. (S21) are always E(ξ) (x(t i ) = E(ξ), ∀i), and the spike times in an event, relative to the start time of the event, are independent of all other events, as the superthreshold interval in the event produces spikes with spiketime distribution independent of other superthreshold intervals. Thus the µ i are independent random variables and the s i are independent random variables. (µ i is independent of µ j and s j ∀j = i, and s i is independent of µ j and s j ∀j = i; and of course, µ i is correlated with s i ∀i, because they are from the same event.) With this independence, we are free to perform the sums in Eq. (S18) and Eq. (S20) in any order. We take advantage of this to group the events by their spike number, which we will see makes a further simplification. We write is a Poisson distribution with mean λ-the probability for an event to have l + 1 spikes. Here, we have written that there are M p(l)+O( √ M ) (l +1)-spike events for each l = 0, 1, ... because the number of l + 1 spike events is a random variable with mean M p(l) and a deviation of order O( √ M ), as M is large, and we have defined the random variables µ i (l) and s i (l) to have the same distribution as µ i and s i , but conditioned on the number of spikes in the event i being l + 1. (The O( √ M ) deviation can be intuitively understood as the standard deviation from a binomial distribution, where a successful trial is an event having l + 1 spikes, and a failure as having any other number of spikes.) Importantly, each µ i (l) and s i (l) maintain independence from the other events because the conditional distribution is simply nomenclature. Since the µ i (l) are independent random variables, we recognize the sum where the angle brackets · i denote average over the distribution of µ i (l). A parallel argument holds for s i (l), and we can write which distills our task down to calculating µ i (l) i , s i (l) i , and t end .
Since we had replaced the integral over the long time interval from t = 0 to t end in Eq. (S16) and Eq. (S17) with a sum over M events, we also need to express the normalizing fraction 1 t end in terms of M so that we can eventually cancel out M that appears in Eq. (S25) and Eq. (S26)-M should not appear in the final result because we introduced it simply to express the long time window over which we are averaging. Fortunately, it is easy to express t end by recalling that there are M p(l) + O( √ M ) (l + 1)-spike events ∀l in the long time interval from t = 0 to t end , and that each l + 1 spike event has duration (l+1)τ /N -each spike inhibits the membrane potential population by 1, yielding a total inhibition of l + 1, and the input current N/τ constantly drives the membrane potentials back to threshold, counteracting the inhibition, ending the event when threshold is reached again at a time (l + 1)τ /N after the start of the event. Thus we can simply sum up the duration of all events, to express t end = ∞ l=0 (M p(l) + O( √ M ))(l + 1)τ /N , and we can substitute this expression into Eq. (S25) and Eq. (S26) to yield Now we can see that dividing both the numerator and denominator by M makes the M disappear from the expression: the terms with factor M p(l) become M p(l)/M = p(l), and the terms of , which go to zero because M is large; thus dividing numerator and denominator by M yields Finally, it remains to calculate the expectation values µ i (l) i and s i (l) i , which we recall are averages over the random variables Eq. (S19) and Eq. (S21), conditioned on there being l + 1 spikes in the event in the time interval t i to t i+1 in Eq. (S19) and Eq. (S21). Let us consider an instantiation of an l + 1spike event, compute µ i (l) and s i (l) for this event, and then consider taking an average over instantiations of l + 1-spike events to compute µ i (l) i and s i (l) i . Recall that for an instantiation of an l + 1-spike event, the readoutx(t) has value E(ξ) at the beginning and end of the event, that the time to first spike in the superthreshold interval at the beginning of the event is a exponentiallydistributed random variable z 0 with mean 1/N ρ and standard deviation 1/N ρ, and that l spikes occur during the delay ∆, after the first spike. We illustrate the readout's trajectory during this event in Figure A where t i is the start time of the event). Thus to first order in time, the readout decreases linearly during the event (with the exception of discontinuous increases due to spikes), with slope −(E(ξ) + O(1/N ))/τ , which is −1/τ to leading order in N , because E(ξ) is 1 to leading order in N (Eq. (S12)). We do not consider higher-order corrections to this approximately linear decrease in the readout, because the constant slope −1/τ provides consistency with our assumption that the readout at the beginning and end of the event, x(t i ) andx(t i+1 ), are effectively fixed at the expectation value of ξ-the constant slope of −1/τ over the duration of the event (l + 1)τ /N perfectly balances the contribution (l + 1)/N to the readout from the (l + 1) spikes, i.e. 0 = (l + 1)/N − 1/τ × (l + 1)τ /N , returning the readout to E(ξ) at the end of the event-and the constant slope is sufficient to find O(1/N ) contributions to σ readout as we will see.
Second, the standard deviation of the first-spike time t i + z 0 (where t i is the start time of the event) is much larger than the variability of the l spurious spike-times that occur during the delay ∆; importantly, this fact will allow us to collapse the spike-times of the l spurious spikes to a single time for the purposes of calculating the most salient contributions to x(t) t and σ readout , as a more finely-timed treatment would only provide higher-order corrections. To see why this fact is true, recall that we are working in the λ = δρ 1 limit. Dividing both sides of this inequality by N ρ, we arrive at δ/N 1 N ρ , and substituting in ∆ = δ/N , we have ∆ 1 N ρ . Recalling that the standard deviation of z 0 is 1/N ρ, we see here that the standard deviation of z 0 is much larger than the delay ∆. Now, while the first spike time is exponentially distributed, for the spike-times of the l spurious spikes we must remember that the l + 1 spike event is an event that is conditioned on having l spurious spikes occur during the delay ∆. (This assumes, as we did earlier, that we are working in a regime in which the first spike's inhibition is sufficient to return the population to subthreshold after the delay ∆, preventing the possibility of further spikes until the population reaches threshold again, which marks the end of the event.) Thus these l spikes are not exponentially distributed like the first spike-indeed, they must occur between time t i + z 0 and t i + z 0 + ∆. Regardless of the spike-time distribution in this interval, the l spurious spikes have spike-times that are bounded in a time interval of width ∆, which is much smaller than the standard deviation of z 0 . Thus the chief contribution to spike-time variability for the l spurious spikes comes from z 0 , and we can collapse the l spikes to the same spike-time t i + z 0 , which we illustrate in Figure A (bottom).
Using these features together-the constant slope of the readout decay and the collapsing of the l spurious spike-times-we can write down µ i (l) and s i (l) for an instantiation of an l + 1 spike event: where the first integral is from the beginning of the event to the time of the first spike, and the second integral is from the time of the first spike to the end of the event. We next want to average over instantiations i to obtain µ i (l) i and s i (l) i . Importantly, recall that z 0 is an exponentially distributed random variable with mean and standard deviation 1/N ρ, so we must average over this distribution. Averaging over this distribution yields Now we can calculate the mean readout x(t) t by substituting the expression for µ i (l) i (Eq. (S33)) into the expression for x(t) t (Eq. (S29)), which yields And then we can calculate the variance of the readout σ 2 readout by substituting the mean readout (Eq. (S35)) into the expression for s i (l) i (Eq. (S34)), and substituting Eq. (S34) into the expression for σ 2 readout (Eq. (S30)), which yields where we have used λ = ρδ to substitute ρ for λ δ . We can find the optimal noise level and minimal σ readout by differentiating σ readout with respect to λ and setting the derivative equal to zero. This yields We corroborate these results with simulations ( Figure 2).

A.2.3 Additional variance from random interspersing of 2-spike events
Importantly, we had assumed in our calculations for the mean readout x(t) t and the readout fluctuations σ readout that the variance of the readout at the time of threshold-crossings ξ :=x(t thresh ) is to leading order var(ξ) = O(1/N 3 ); we used this fact to consider the value of the readout at the start and end of events as effectively fixed, compared to the leading order fluctuations in σ readout , which were O(1/N ). However, we had only shown var(ξ) = O(1/N 3 ) if one considers that only 1-spike events occur, and no l + 1 spike events occur (where l > 0) (the derivation leading to Eq. (S15)). So it remains to confirm that, when there is an appreciable probability that l + 1 spike events occur (where l > 0), the variance var(ξ) To see why this is the case, we first imagine a readout trajectoryx(t) that consists only of 1-spike events, of which we know var(ξ) = O(1/N 3 ); then we modifyx(t) so that it includes l-spike events, calling this new trajectorŷ x(t); and finally we calculate how much the variance var(ξ) of the readout at threshold-crossingξ :=x(t thres ) changes under the perturbationx(t) →x(t), compared to the original variance var(ξ).
Recalling that the mean number of spurious spikes λ 1, the most prevalent type of event is a 1-spike event (0 spurious spikes), and the second-most prevalent type of event is a 2-spike event (1 spurious spike). Thus we can consider the perturbationx(t) →x(t) to leading order in λ by only considering the addition of 2-spike events. Now let us consider what adding 2-spike events looks like. The readout trajectoryx(t) is a sequence of 1-spike events, while the readout trajectoryx(t) is a sequence of 1-spike events and 2-spike events, with 1-spike events accounting for the fraction p(0) = 1 − λ + O(λ 2 ) of all events and 2-spike events accounting for the fraction p(1) = λ + O(λ) 2 of all events. Importantly, the few 2-spike events are randomly interspersed among the many 1-spike events, as we recall that the number of spikes in a given event is independent of all other events. Thus, we can imagine adding 2-spike events tox(t) by randomly replacing a fraction λ of (randomly selected) 1-spike events with 2-spike events. Now importantly, 2-spike events have duration 2τ /N , while 1-spike events have duration τ /N , which means replacements must shift other events (events before the replacement to earlier times and/or events after the replacement to later times) in order to create room for the longer 2-spike event. Thus instead of considering this complicated dependency of the other event times due to each single replacement, let us instead consider replacing subsequent pairs of 1-spike events with 2-spike events; a subsequent pair of 1-spike events has duration 2τ /N , which matches that of a 2-spike event, and thus the other event times do not change when such a pair replacement is made.
To understand how many such replacements are necessary to result in a fraction λ of 2-spike events inx(t), consider starting with a large number M of events in a long time interval ofx(t) as we did earlier. We wish to perform replacements of 1-spike events inx(t) so that the fraction of 2-spike events after replacementsx(t) is λ. The question is-how many 1-spike pair replacements should we make? Let us consider replacing K 1-spike event pairs. The number of 1-spike events is initially M (there are only 1-spike events inx(t)). Replacing K 1-spike pairs removes 2K 1-spike events and introduces K 2-spike events. Thus the number of 1-spike events becomes M − 2K and the number of 2-spike events becomes K. We wish for the fraction of 2-spike events K M −2K to be λ. Solving for K, we find K = M λ + O(λ 2 ). This means that we must replace a fraction K/M = λ+O(λ 2 ) pairs of 1-spike events to make the fraction of 2-spike events to be λ. Now that we know the fraction of 1-spike event pairs to replace with 2-spike events to represent the perturbationx(t) →x(t), we need to consider what the effect such a perturbation has on var(ξ) − var(ξ). First, let us consider the replacement of one pair of 1-spike events, whose end is at a time t replace , which is nτ /N before the time t thresh (t replace = t thresh − nτ /N ). Recall that the replacement only changes the spike times of the 2 spikes in the pair of 1spike events, and the spike times in all other events remain the same. The perturbation in the readout at t thresh ,ξ − ξ, is where we have neglected any differences due to first-spike times because they are much smaller than the change due to the replacement. (The first-spike times in the original 1-spike events each have standard deviation 1/N ρ, which is much smaller than the O(τ /N ) jump in spike-time due to replacing the pair of 1-spike events with a 2-spike event. As discussed earlier, we are considering the regime in which a single spike's inhibition is sufficient to return the entire population to subthreshold, which means 1/N ρ τ /N , and thus all spikes happen near the beginning of an event-the second spike here, which is originally in the second 1-spike event, now takes place near the beginning of the 2-spike event, being Importantly, individual replacements are probabilistic, happening with probability λ for each pair of 1-spike events. Thus, now that we know the differencẽ ξ − ξ due to a replacement, we can use it to express how a single probabilistic replacement effects var(ξ) − var(ξ). Consider any pair of 1-spike events inx(t) (indexed by n in their end time t replace = t thresh − nτ /N ); the probability that this n'th pair is replaced by a 2-spike event is λ. Thus the change in mean readout at t thresh , E(ξ) − E(ξ), due to this single probabilistic replacement is where the change d(n) due to the potential replacement is weighted by its probability λ. Furthermore, the contribution to the variance in the readout at t thresh , var(ξ) − var(ξ), due to this single probabilistic replacement is where we have used the fact that the probabilistic replacement is independent of other sources of variance. Finally, we must sum over all probabilistic replacements to express the total effect of 2-spike events on var(ξ) − var(ξ). We can sum the additional variance from each probabilistic replacement var(d(n)), where each pair of 1-spike events considered for replacement is indexed by n in their end time t replace = t thresh − nτ /N : where we have safely neglected the low-probability event that two pairs of 1-spike events that share a 1-spike event (subsequent pairs of 1-spike events, overlapping in time) are both replaced, which has probability O(λ 2 ). We can see that the leading-order term in Eq. (S45) is O (1/N 3 ), which completes the argument that var(ξ) = O(1/N 3 ) because we had already shown that the variance considering 1-spike events only, var(ξ), is O(1/N 3 ).

A.3 Readout error for the LIF model with zero delay
Our ultimate goal is to study the standard deviation σ readout as a function of the delay ∆ and noise σ in the membrane potential dynamics Eq. (S1); as mentioned in Models, our study will uncover an upper-bound for σ readout as a function of delay ∆ and noise σ. But for ease of exposition, let us first study σ readout in the case of zero delay ∆ = 0, with noise σ > 0. In this case, the membrane potentials undergo the dynamics Studying this delay ∆ = 0 case, we will uncover properties of the membrane potential dynamics Eq. (S46) that will serve as a baseline for studying the dynamics when there is a delay ∆ > 0 (the dynamics in Eq. (S1))-we study ∆ > 0 in the next section.
To understand the behavior of the population of membrane potentials V i in Eq. (S46), consider for the purpose of pedagogy the situation in which the membrane potentials start with some initial condition V i (0), and inhibitory spike terms are disabled (o j (t) terms are removed, and the neurons never spike, even when they surpass threshold). The dynamics become These dynamics correspond to an OU process [1, 2]-each membrane potential V i (t) undergoes an independent OU process with time constant τ , leak constant λ V , drift term N , and independent noise √ τ ση i (t). In particular if the process starts at a well specified initial condition V i (0), then its mean E(V i (t)) evolves in time as Notably, the initial condition V i (0) is exponentially forgotten on a time scale of τ λ V and the external drive N forces the mean to saturate at the value N/λ V , again over a time scale of τ λ V . Importantly, however, the temporal autocovariance of the stochastic process V i (t), is independent of the drift term: where s and t are two times and the angle brackets · denote an average over the stochastic trajectory η i . The second term e − λ V τ (t+s) reflects the non-stationarity of the autocovariance due to the definite initial condition at time 0, and for large times s, t τ this non-stationary term vanishes. Let us consider large times, as we are interested in the setting in which the network is continuously operating. And for a time s = t, the first term e − λ V τ |t−s| is 1, showing us that for a particular large time t τ , the variance of Thus, for an instant in time t, each V i (t) is a random variable with mean E(V i (t)) and variance var(V i (t)) = σ 2 2λ V , which we have also named σ 2 OU for a more brief notation in future calculations. Furthermore, it is known that V i (t) is Gaussian-distributed (a property of the OU process). Thus a simple picture emerges here-for any moment in time t τ , the membrane potentials are Gaussian-distributed with variance σ 2 2λ V , forming what we call a "packet" that is centered at E(V i (t)) = N/λ V . Of course, the individual membrane potentials fluctuate as time goes on, but the packet remains the same size, with variance σ 2 OU = σ 2 2λ V . Next, let us consider reintroducing the inhibitory terms o j (t), and describe the dynamics Eq. (S46). Compared to our preceding description of the dynamics without inhibitory terms, we see one difference: the drift term N is now counteracted by the inhibitory spike term −τ . Namely, that this equation does not depend on the initial condition V i (0). Then imagine the following: the packet of membrane potentials evolves according to our previous description, until one neuron reaches threshold. This neuron fires a spike, and, importantly, this spike is instantaneously delivered to all other neurons through the −τ N j=1 o j (t) term (as the delay ∆ = 0), which simultaneously decrements the entire membrane potential packet. Since the decrement is simultaneous, the relative positions of the membrane potentials in the packet are preserved over that instant in time (but of course, the mean membrane potential is decremented). Thus the development of the autocovariance of the packet is preserved (it continues to grow or saturate to a steady state as in Eq. (S50)) because we can consider the new position of the packet after the spike to simply be a new set of initial conditions, and the autocovariance does not depend on the initial conditions. Of course, the mean membrane potential oscillates in a zig-zag fashion, driving upward due to the drift term N and being decremented with each spike, but the packet eventually grows to the same steady-state variance of σ 2 OU = σ 2 λ V . Thus to summarize the dynamics of Eq. (S46), we have a Gaussian packet of membrane potentials V i (t) rising toward threshold at a rate N/τ (driven by the input current N ) and discouraged from deviating from zero due to the leak term −λ V V i (t). Since we are interested in an operating regime where the network dynamics are driven by tight balance (the input current N term), let us assume the leak term −λ V V i (t) is small compared to the input current N , i.e. |λ V V i (t)| N ; thus the input current N surely drives the packet to threshold. When the top neuron in the packet hits threshold at a time we call t thresh , it fires a spike that instantaneously decrements the entire population of membrane potentials by 1 (the −τ N j=1 o j (t) term integrates to 1 because of the time constant τ multiplyingV i (t) in Eq. (S46)). Then, the membrane potential packet continues traveling toward threshold, and the process repeats. Note that in this repeated process, the same neuron does not necessarily fire successivelythe membrane potentials continuously fluctuate within the packet, and thus a different neuron can become the top neuron in the membrane potential packet at any time.
Importantly, we wish to consider high-performing networks, and it is helpful to identify what regime of parameters facilitates high performance, i.e. small readout standard deviation σ readout . First, let us consider the noise level σ. The width of the membrane potential packet σ OU is controlled by σ through σ OU = σ 2 /2λ V . Thus for small σ, the packet width is small, and for larger σ, the packet width is large. Now, to understand how the width of the packet affects the readout standard deviation σ readout , let us consider first small σ, such that σ OU 1. In this case, the fluctuation in the membrane potential of the top neuron in the packet is small because the packet itself is tight. Thus the time between spikes ∆t from when the top neuron, say, neuron i, spikes at a time t thresh,1 , and when the top neuron, neuron j (neuron j is not necessarily the same neuron as neuron i), subsequently spikes again at time t thresh,2 (∆t := t thresh,1 −t thresh,2 ) has very little variation. To see this, recall that the inhibition from the first spike at time t thresh,1 inhibits the membrane potential packet by 1. The packet then travels toward threshold due to the input current N (recall that we can ignore the leak term −λ V V i (t) N ). Now, if the packet is infinitely tight, i.e. σ OU = 0, then it takes a time ∆t = τ /N for the population to reach threshold again at time t thresh,2 , with no fluctuations in the time ∆t. (a point packet is travelling toward threshold with constant current N and repeatedly inhibited) This results in perfectly regular interspike intervals of τ /N , which correspond to an the ideal zig-zag readout as discussed in the large ρ limit of the soft-threshold model-the best performance achievable, given that the readout is defined as a sum of firing rates that are filtered spike-trains Eq. (S2). Next, consider a small σ > 0, in which σ OU > 0. Now the fluctuations in the membrane potential of the top neuron can be significant because the membrane potentials fluctuate within a packet of finite width; thus variations are introduced in ∆t. These variations are a departure from the perfectly regular spike-times of the σ = 0 case, thus they increase σ readout . Thus we find here that high-performing networks correspond to low levels of noise σ.
Importantly, the magnitude of the fluctuations in the interspike interval time ∆t is controlled by the standard deviation of the membrane potential packet σ OU = σ/2 √ λ V . Thus another way to achieve consistent interspike interval times, aside from small σ, is to choose large λ V because this will make σ OU small as well. But we cannot adjust λ V arbitrarily, as we recall that we would like to consider values of λ V that correspond to the regime in which the network dynamics are dominated by tight balance, i.e., |λ V V i (t)| N . Now, importantly, we are considering σ OU 1, and the membrane potentials V i (t) exhibit O(1) fluctuations due to decrements from inhibitory spikes when the top neuron reaches threshold-thus the combined effects of the packet's width σ OU 1 and the inhibitory spikes create an overall membrane potential dynamics that exist in an O(1) dynamic range. Hence, the leak term N , so long as λ V scales less than O(N ). Let us assume that λ V is a given property of the network dynamics that satisfies this constraint.
Equipped with an understanding of the membrane potential dynamics Eq. (S46) and a qualitative understanding of how noise σ affects the dynamics, let us work toward our goal of quantitatively expressing the readoutx(t)'s standard deviation σ readout as a function of the noise σ for a network with some given leak parameter λ V . We start by recalling that the readoutx(t) is a sum of decaying exponentials, with each term corresponding to a spike-time in the network: Here, we have written ∆t k := t − t k where t k is the time of the k'th spike in the past from any neuron in the network. For ease of exposition, it is useful to define the notion of an "event" here as the interval of time between two subsequent spikes. Thus the time interval from t 2 to t 1 is an event, the time interval from t 3 to t 2 is an event, and so on. (Note that this is the same definition of "event" as in our description of the soft-threshold model, because spike-times are equal to membrane potential threshold-crossing times in the LIF model; also note that we only have 1-spike events here, as there are no delays.)

A.3.1 Readout at a single point in time
Now, we eventually want to compute the integral ofx(t) over time to determine σ readout (Eq. (S17)), but to begin let us consider the readoutx(t) at just one point in time, t mid := t 1 + τ 2N , that is near the middle of the event that starts at time t 1 . Let us call the value of the readout at this time ξ :=x(t mid ). To see why this time t mid is near the middle of the event, consider the typical duration of an event-the membrane potential packet has been inhibited by 1 from the spike at the beginning of the event, and the driving current N takes a time τ /N to push the packet back to threshold, if we ignore fluctuations in the membrane potential packet. Thus, t mid = t 1 +τ /2N is about halfway through the duration of the event. Now importantly, the spike times t k are random variables that depend on the fluctuations of the membrane potentials due to the noise term √ τ ση i (t) in Eq. (S46), and so ξ is also a random variable that depends on the the spike times ∆t k through Eq. (S53). Thus we need to study the spike times ∆t k . To simplify our analysis, let us first consider small λ V , where λ V 1. To understand how small λ V simplifies the dynamics of the membrane potentials, first consider Eq. (S50). Notice that λ V sets the time-scale of autocorrelations in membrane potential trajectory V i (t) to τ /λ V , as seen in the first term of Eq. (S50). Thus a small λ V corresponds to a long mixing time τ /λ V of the membrane potential packet. In essence on time scales much less than that of τ /λ V , the relative order of membrane potentials will be preserved. Note also that the time it takes for the mean of the packet to travel to threshold right after experiencing inhibition is τ /N . Thus if the mixing time τ /λ V is much greater than the time to rise to threshold τ /N , from event to event the identity of the top-neuron that first spikes will largely be preserved for successive events. We can see this effect in spike raster plots from simulations of the membrane potential dynamics Eq. (S46) (Figure 3(a)). Notably, as λ V is decreased, the same neuron repeatedly spikes; i.e., the same neuron remains the top neuron in the packet over many events. Thus for small λ V N , we can approximate the spike-time generation dynamics by only considering the dynamics of a single neuron's membrane potential-that of the top neuron.
Considering just the single membrane potential of the top neuron evolving according to Eq. (S46), we can easily calculate the statistics of event durations, i.e., the time between the last spike of the top neuron and its next spike. This will in turn allow us to calculate the statistics of ∆t k in Eq. (S53). Let us consider an event that starts at time t s and ends at time t f , and let us call the membrane potential of the top neuron V α (t) (α is the index of the top neuron). At the instant of t s , the membrane potential V α (t) has just reached threshold, the top neuron spikes, and the membrane potential V α (t) is decremented by 1 by the inhibitory spike. Next, the membrane potential V α (t) is driving back toward threshold by the input current N , and experiences two other effects during its journey: the leak −λ V V α(t) and the noise ση α (t). Since λ V is small and V α (t) is O(1), we can neglect the leak term. But the noise term √ τ ση α (t) is accumulated in the integration of Eq. (S46) while the membrane potential V α (t) travels toward threshold, and the total time t f p := t f − t s it takes for the top neuron to reach threshold is a random variable. Fortunately, we can recognize t f p as the first-passage time of a particle undergoing Brownian motion with drift N , noise level σ, time constant τ , and with a goal that is a distance 1 away. The moments of t f p are known [2]. The mean is and the variance is Furthermore, the duration of the next event is simply another instance of the random variable t f p , as the history of previous events is not recorded in any way in the state of the top neuron. Thus, defining the random variable t n f p as the time between spike time t n+1 and spike time t n , i.e., t n f p := t n − t n+1 , we can write the spike times ∆t k in Eq. (S53) a time t = t mid as where by definition t mid − t 1 = τ /2N . Now that the ∆t k are expressed as sums of random variables t n f p , we can consider evaluating the readout at time t mid , ξ =x(t mid ). We rewrite Eq. (S53) as Here we can recognize that ξ is a sum of correlated random variables, because the exponent of the k'th term is a sum of the first k − 1 t n f p s; for instance, the k = 2 term contains t 1 f p , and all k > 2 terms also contain t 1 f p . Furthermore, the first-passage times t n f p are not Gaussian-distributed [2]. Thus evaluating this sum in non-trivial, in contrast to our analysis for the soft-threshold model. However, to make evaluating this sum more tractable, we note one important property: in the vast majority of e −(...) terms, the sum n=1 t n f p as a Gaussian distribution. Furthermore, this Gaussian distribution only depends on the mean and variance of the t n f p . Of course, even with this simplification, we still need to take into account that the e −(...) terms are correlated.
To formalize our central limit theorem approximation, let us rewrite Eq. (S58) as where is a zero-mean Gaussian-distributed random variable with variance equal to that of t n f p . (We pulled out the mean of t n f p to create the τ /N term in Eq.(S59)). Now recall that we are ultimately interested in computing the variance of the readout σ 2 readout , and so as a starting point, let us compute the variance of the readout at the particular time t mid , σ 2 ξ := (ξ − ξ ) 2 . We begin calculating σ 2 ξ by making some simplifications. First, we pull out the common factor for all terms e −τ /2N , and we substitute in the simple relation Second, we note that the sum h(k) := k−1 n=1 γ n has zero terms for k = 1, i.e., h(1) = 0 n=1 γ n = 0, one term for k = 2, i.e., h(2) = 1 n=1 γ n = γ 1 , and so on. Thus there is a mismatch between the term-number k and the number of γ n in the sum for term k. We would like to remove this mismatch to make the expression simpler, so we therefore pull out the first k = 1 term in Eq. (S61), and re-choose the index k → k − 1 so that it now starts at k = 1 with the term that contains one γ n (that is, γ 1 ). This results in an equivalent expression to Eq. (S61), Now, we can see that the first term in Eq. (S62) is constant and not a random variable, so the variance σ 2 ξ does not depend on it, thus we can neglect it in our calculation of σ 2 ξ . Thus our goal is to compute the variance due to the other terms in Eq. (S62). Importantly, the e −1/2N that we factored from all terms is simply a multiplicative factor, thus if we define then the variance of ξ is the variance of ζ times (e −1/2N ) 2 , i.e., σ 2 ξ = ( 1 N e −1/2N ) 2 σ 2 ζ . Keeping this conversion from σ 2 ζ to σ 2 ξ in mind, we can turn our focus to computing σ 2 ζ . Looking at Eq. (S63), we need to understand how we will compute the variance of a sum of correlated random variables. We note that we can break down the k'th term into a factor e −k/N multiplying the random variable e − k n=1 γn/τ . And we can see that the random variable e − k n=1 γnτ is log-normal distributed, because the γ n are Gaussian-distributed and so is their sum. Thus we see that we have a sum of weighted, correlated log-normal random variables. Importantly, the variance of the sum depends on the variance of the individual correlated random variables (and how they are correlated), thus it is useful to introduce here a definition for the variance σ 2 0 of the log-normal random variable e −γn/τ : Now, we can actually compute the variance of σ 2 ζ by adding variances and splitting according to the contributions from each γ n to account for correlations. To do this, we introduce a simple approximation that is accurate for small σ 1; namely, that the variance of the random variable e − k n=1 γn/τ is approximately equal to the variance of the random variable k n=1 e −γn/τ for all k. Thus, using this approximation as we consider the contribution to the variance from each term in ζ, we encounter a sum of now uncorrelated random variables, with coefficients determined by the term number k. Namely, the first few terms when considering the variance of ζ are e −1/N e −γ1/τ , e −2/N (e −γ1/τ + e −γ2/τ ), and so on. Thus we can sum the coefficients for each independent random variable (each γ n ), and add the coefficient sums in quadrature to yield the variance σ 2 ζ . We write where we have introduced the definition for brevity. Next, to simplify Eq. (S65), we can recognize that the sums ∞ k=j r k are geometric series, whose totals are r j 1−r and write And factoring out r 2 (1−r) 2 , we see that the result Eq. (S67) is itself a geometric series, with constant ratio r 2 , which we can substitute for its total 1 1−r 2 , writing Thus finally we can substitute back in the definitions for r and σ 2 0 into Eq. (S69), and recall the conversion σ 2 ξ = ( 1 N e −1/2N ) 2 σ 2 ζ to obtain

A.3.2 Mean readout error
To summarize the result Eq. (S71), it tells us that the value of the readout at a time τ /2N into an event,x(t mid ), is a random variable with standard deviation σ/ √ 2N to leading order. This is a useful fact, because it gives us a starting point for calculating σ readout , as we now illustrate. Recall from our discussion of the soft-threshold model, that we can express the trajectory ofx(t) as a series of events (here events go from spike to subsequent spike, while in the soft-threshold model events went from superthreshold crossing to superthreshold crossing), and we can express the integrals over time that define the mean readout x(t) and the variance of the readout σ 2 readout as sums of of smaller integrals over the duration of individual events (Eq. (S18), Eq. (S19), Eq. (S20), and Eq. (S21)). But importantly, in contrast to the soft-threshold model, the value of the readout x(t) during each event is not independent of all other events; in fact, for the LIF model, our calculation for Eq. (S71) shows spike-time variability accumulated over many spikes (the Recognizing this important property that the readoutx(t) is correlated from event to event, we can still however use our result for σ ξ (Eq. (S71)) to compute σ readout because we need to take a sum over many events, and addition is commutative. The sum for σ readout (Eq. (S20)) can be executed in any order-thus all we need to know is the distribution of the integrated squared deviation s i for an individual event.
Thus, we can perform a parallel computation for σ readout to that in the softthreshold model (Equation Eq. (S18) to Eq. (S37)), except here we only have 1-spike events. Eq. (S29) becomes and Eq. (S30) becomes where we recall that the denominator in these equations expresses the mean duration of an event, which for LIF model we recall is simply t f p = τ /N . To express the expectation values µ i i and s i i , we can first consider a particular instance of an event, integrate over the duration of this event to compute µ i and s i , and then take the average over i to yield µ i i and s i i . For a particular event then, we can write where ξ i is an instance of the random variable ξ, and the time t starts at the beginning of the event with t = 0. The expression in () in Eq. (S74) and Eq. (S75) is the readoutx(t ). Importantly,x(t ) starts with value ξ i + 1/2N at t = 0, and decays approximately linearly with slope −1/τ (by the same reasoning for the constant slope of −1/τ in the soft-threshold model). Thus at time t = τ /2N , the readout equals ξ i , as the definition of ξ i requires. Finally, the readout continues to decay until the event ends at time t = t i f p . To evaluate µ i i , let us represent the deviation of ξ i from E(ξ) with the unit Gaussian random variable y i multiplied by σ ξ . Eq. (S74) becomes and its expectation value (averaging over y i and t i f p ) is because y i has zero mean, the integral from t = 0 to the mean t i f p at t = τ /N evaluates to exactly E(ξ)τ /N , and any additional contributions from the deviations of t i f p from its mean τ /N are order O(1/N 3/2 ) (Eq. (S55)). Thus the mean readout Eq. (S73) to leading order is This result can be understood intuitively by comparing to the nominal operation of the predictive coding framework (the large ρ limit discussed in the softthreshold model). Intuitively, for high-performing networks (low noise), the network produces perfectly regular spikes, and thus the decaying readoutx(t) ≈ 1 by tracing a zig-zag around the encoded variable x(t) = 1. Since each spike contributes 1/N to the readout, the value of the readout immediately before each spike is 1 − 1/2N , so that when the spike arrives, the readout is 1 + 1/2N . This centers the zig-zag perfect aroundx(t) ≈ 1, and notably, since the time between spikes is short (O(1/N )), the readout decays linearly to 1 at the mid-point time between subsequent spikes. Hence, x(t) t = E(ξ) for high-performing networks. Next, to evaluate s i i , let us again introduce the random variable y i and substitute into s i Eq. (S78) to obtain Performing this integral and taking the expectation value over i (averaging over y i and t i f p ), we obtain to leading order where we have again neglected any higher-order contributions from the deviations of t i f p from its mean, t f p = τ /N . Substituting the leading-order term in this expression into Eq. (S73) and taking a square root, we finally obtain Now importantly, we recall that we made two important assumptions to arrive at this result. The first was that λ V 1, and the second was that the sum of spike-time variations k n=1 γ n could be well-approximated as Gaussian with the central limit theorem and that any deviations from this approximation are small because only a few sums k n=1 γ n have a small number of terms (small k, while in the calculation, k ranges from 1 to ∞). Thus for small λ V , we evaluate our Gaussian approximation by comparing our result Eq. (S82) against Monte Carlo simulations that only contain the top neuron ( Figure B). Secondly, to understand how our result relates to networks with larger λ V -when the top neuron alone is not sufficient to describe the network's dynamics-we perform simulations with a variety of values of λ V . Interestingly, we find empirically that Eq. (S82) provides an upper-bound for σ readout (Figure 3c). This upper-bound is an important result that will be used in the next section.  S4)) when the t i f p s are drawn from the firstpassage-time distribution of an OU process (red lines of different shades for different λ V ), appear to match σ readout when the t i f p s are drawn from a Gaussian distribution with moments from the first-passage-time distribution of Brownian motion (light blue lines, redundant simulations; black line, Eq. (S82)). These single-neuron simulations are equivalent to assuming the top neuron in the membrane potential packet remains the top neuron forever, like in the λ V 1 limit in Figure 3a.

A.4 Readout error for the LIF model with delay
In the previous section, we studied the dynamics Eq. (S1) with no delay (delay ∆ = 0) to determine the readout standard deviation σ readout as a function of the noise level σ. This provides a good starting point to determine σ readout as a function of both σ and nonzero delay ∆ > 0. To begin, let us consider the regime of parameters we would like to consider; in particular, we are interested in studying networks that exhibit high performance, i.e. small σ readout , and networks that are driven by tight-balance, i.e., the leak term −λ V V i (t) in Eq. (S1) is small compared to the driving current N . As discussed in the previous section, this regime corresponds to noise level σ 1, and we consider λ V as a given network property that satisfies the constraint that the leak term is small. Thus it remains to consider what values of delay ∆ correspond to high performance. Intuitively, from our analysis of the soft-threshold model, we understand that small delays correspond to high performance-small delays allow one neuron to quickly inhibit the others, and this helps prevent spurious spikes that would otherwise increase readout error. Therefore, we consider small delays ∆ for the LIF model. And more precisely, for ease of analysis, we will consider delays that are much smaller than the typical time between subsequent network spikes. The typical time between subsequent network spikes is t f p = τ /N (see previous section), and so we will consider delays ∆ = δ/N , where δ τ .
To understand how the membrane potentials evolve in the presence of delay ∆ > 0, let us first recall our description of the dynamics when ∆ = 0 from the previous section. The membrane potentials travel in a Gaussian packet of width σ OU = σ/ √ 2λ V toward threshold, driven by the input current N . When the top neuron in the packet hits threshold it fires a spike, and this spike then propagates to the other neurons, decrementing their membrane potentials, and thus prevents the other neurons from firing (until the packet reaches threshold again). Now this is where we see a departure from these dynamics due to the propagation delay ∆ in Eq. (S1). The top neuron indeed fires a spike and resets itself when it reaches threshold (through the self-reset −τ o i (t) term), but the spike takes a time ∆ to decrement the other membrane potentials (through the −τ j =i o j (t − ∆) term). Thus the other neurons in the packet continue to travel toward threshold while they are waiting to receive the top neuron's inhibitory spike, and they may themselves cross threshold and fire spurious spikes. We will see, just as we saw in the soft-threshold model, that spurious spikes increase the readout error σ readout , and thus one of our goals in this section will be to calculate the number of spurious spikes and the effect they have on the readout. Now as we continue to describe the dynamics with delay ∆ > 0 and eventually calculate quantities like the mean number of spurious spikes, we will see as in the previous section, that defining the notion of an "event" is helpful. Thus we define an event here as the interval of time from when a top neuron in the membrane potential packet reaches threshold, at a time we refer to as t thresh,1 , to the next time a top neuron reaches threshold again (this latter top neuron need not be the same as the previous top neuron), at a time we refer to as time t thresh,2 . Importantly, we define neurons that spike within the time interval t thresh,1 to t thresh,1 + ∆ as spuriously spiking neurons, and do not consider these neurons as candidates for marking the end of the event at t thresh,2 . Instead, the neuron that next spikes after t thresh,1 + ∆ is denoted as the next top neuron to reach threshold, marking the end of the event when it reaches threshold at time t thresh,2 . To understand why this notion of event is welldefined, recall that we are focusing on the limit of small noise and small delays. In particular, note that the inhibition from one or more spikes (the first spike from the top neuron at t thresh,1 and any additional spurious spikes that may occur) decrements the membrane potentials by 1 or more, and the membrane potentials are driven toward threshold by the current N . Thus it takes a mean time of O(τ /N ) for the next spike to occur after t thresh,1 +∆ (the time when the entire population is first inhibited due to delayed inhibition from the first spike from the top neuron), with deviations from this mean time being small because we are considering the limit of small noise. And since we are considering delays δ τ , which is equivalent to ∆ τ /N , we see that the time interval during which spurious spikes can occur is much smaller than the O(τ /N ) time for the next nonspurious spike to occur. Thus there is a clear separation between the spike times of spurious spikes (between t thresh,1 and t thresh,1 + ∆) and the spike of the next top neuron t thresh,2 = t thresh,1 + O(τ /N ).
Importantly, examining the interval from t thresh,1 to t thresh,1 + O(∆) more closely, we see another departure from the zero-delay dynamics described in the previous section when we consider that in any particular event, the top neuron is decremented at time t thresh,1 , while the other N − 1 neurons are decremented at time t thresh,1 + ∆. (And if spurious spikes occur in the event, further differences in inhibition times in the population are of O(∆).) Interestingly, this creates a "head start" toward threshold for the top neuron in the packet, relative to the other neurons, as the membrane potentials travel toward threshold until the next top neuron spikes at time t thresh,2 . To see this, recall that the top neuron's membrane potential is reset to −1/2 when it self-resets at time t thresh,1 . At this point in time, it is receiving an extra small, excitatory current from the leak term −λ V V i because V α (where α is the index of the top neuron) is negativethis is in addition to the input current N . Meanwhile, the other membrane potentials are still traveling toward threshold, and the membrane potentials near the top of the packet are near the threshold at 1/2. Thus these neurons experience a small, additional inhibitory current from the leak term −λ V V i (t), slightly slowing their travel toward threshold. So we see that after all membrane potentials have received the inhibitory spike after delay ∆, the top neuron has integrated extra current relative to the other neurons near the top of the packet, and so it has a head start on its way toward threshold. This corresponds to a stretching of the upper tail of the membrane potential packet from the nominal Gaussian distribution of width σ OU . We will return to this subtle but important departure from a Gaussian packet when the nature of the packet's tail becomes important in our calculations.

A.4.1 Mean number of spurious spikes
Equipped with our understanding of the dynamics Eq. (S1) and the definition of an event, let us now turn to quantifying the number of spurious spikes in a given event, as spurious spikes increase the readout error and thus must be included in our study of σ readout . With the objective of quantifying the number of spurious spikes in an event, consider an event that starts at time t thresh,1 and ends at time t thresh,2 . At time t thresh,1 , the top neuron in the membrane potential packet has just reached threshold, and the rest of the neurons in the packet are below threshold. At time t thresh,1 , we can estimate the typical position of the mean of the membrane potential packet, which we denote asV , by computing what the value of the mean must be so that the tail of the membrane potential distribution that is above threshold is 1/N . When this tail probability is above 1/N it is likely that out of the N neurons, the top neuron has just crossed threshold. The probability density of the membrane potential packet is approximately Gaussian with width σ OU . Thus the approximate position of the meanV of the membrane potential packet relative to the threshold, i.e. θ := 1/2 −V , is then estimated via the condition or equivalently, Next, we integrate the passage of the Gaussian packet across the threshold during the delay ∆ to estimate the mean number of spurious spikes, which we define as λ: To understand this expression, note that between the time t thresh,1 to time t thresh,1 + ∆, the mean of the membrane potential distribution is shifting up at a rate N/τ . So over this duration of length ∆ = δ/N the mean shifts up by δ/τ . Thus the tail above threshold also shifts up. This integral reflects an integral over the additional probability mass of the membrane potential distribution that is above threshold between the time of the first spike t thresh,1 and the time t thresh,1 + ∆ of the first inhibition to all neurons. Multiplying this by N then yields an estimate for the mean number of spurious spikes λ. This simple approximation of Eq. (S83) to Eq. (S85) for the mean number of spurious spikes λ yields a good agreement with numerical simulations ( Figure C) when the distribution of membrane potentials is actually drawn from a Gaussian. We note though that the approximation error consistently corresponds to a slight overestimate at finite N ; this appears to be a finite size effect as this error diminishes as N becomes large, as demonstrated in ( Figure C). In these simulations, we sample the number of spurious spikes 2 × 10 5 times, using the following procedure for each sample. First, we take N draws from a Gaussian of width σ OU = 1 and mean zero, with each draw representing the value of a membrane potential relative to the mean membrane potentialV . Second, we take the maximum of these draws to represent the top neuron's membrane potential; we call this θ. Third, we count how many membrane potentials lie between θ and θ − δ (we choose δ = τ × 10 −2 for this figure), and record the count as the number of spurious spikes. Finally, we average over the 2 × 10 5 samples obtained using this procedure to compute the mean number of spurious spikes λ (blue dots). We also compute λ using Eq. (S83) to Eq. (S85) with the same parameters, σ OU = 1 and δ = τ × 10 −2 (orange curve).
Now importantly, we recall the detail that the O(∆) time-difference between the top-neuron's self-reset and the inhibition of the rest of the population slightly stretches the upper tail of the membrane potential packet due to the leak term −λ V V i (t) through the head start effect. Thus the actual tail of the membrane potential packet is slightly thinner than the Gaussian tail we used in our calculation for λ. However, we note that a thinner tail would facilitate even fewer spurious spikes, because the membrane potentials are more spread out. Thus our calculation for λ is a further overestimate, when this effect is taken into account. Also importantly, we have neglected variations due to the noise term √ τ ση i (t) in the membrane potentials that may cause extra neurons to cross threshold and spuriously spike-but on the timescale of the delay ∆, these diffusive fluctuations have a standard deviation of O(σ √ ∆) = O(σ δ/N ), which is much smaller than the mean membrane potential increase due to the driving current during the delay, which is O(∆N ) = O(δ). Thus, these diffusive fluctuations can be safely neglected in computing the mean number of spurious spikes.
Importantly, the mean number of spurious spikes λ is small for high-performing networks. While we have computed an estimate of the mean here, we will further make the approximation that the distribution of the number of spurious spikes is Poisson. In the limit of small λ, where large numbers of spurious spikes are extremely rare, the detailed distribution of spurious spikes is not expected to have a large impact on the readout error, and so a Poisson approximation with the same mean should suffice to approximately calculate the readout error. We will confirm this expectation below.

A.4.2 Mean readout error
With this knowledge of the distribution of spurious spikes in an event, let us compute σ readout by generalizing the calculation in the previous section. In the previous section, we focused on the case where λ V 1 for ease of analysis, and found empirically that this case provided an upper-bound for σ readout for the case of larger λ V . Thus here, we will again start by analyzing the λ V 1 case, with the expectation that this corresponds to an upper-bound. Now, the primary difference between the dynamics of the network in the previous section with ∆ = 0, and this section with ∆ > 0, is that we have 1 + l-spike events, where l ≥ 0 as opposed to only 1-spike events. So to generalize our calculations for the variation of the readout at a particular time in the previous section, let us define here the value of the readout at a time t mid which is lτ /N +τ /2N after the start of an event, to be ξ :=x(t mid ). Since we are considering the case of high network performance, i.e. λ 1, by far the most prevalent type of event is 1-spike events, followed by 2-spike events. Thus our calculation for the standard deviation σ ξ of the random variable ξ in the previous section carries through to zeroth order in λ. But remarkably, one can also repeat the calculation for σ ξ by imagining a readout consisting of only 2-spike events, and one recovers the same result, σ ξ = σ/ √ 2N . (Intuitively, the sums of spike-time variation k n=1 γ n in the calculation now contain half as many terms, but each γ n has twice the variance because 2-spike events have twice the duration of 1-spike events and integrate twice as much noise √ τ ση i (t).) Thus, aside from additional variance from the random interspersing of 2-spike events among 1-spike events, σ ξ is still σ/ √ 2N . Notably, we have seen in the soft-threshold model that such an additional variance is higher-order in 1/N . However, in the LIF model, we also have the possibility that the membrane potential packet retains a leaky memory of its history, thus 2-spike events may tend to occur in runs of some typical length χ (a correlation length) as opposed to the single, randomly interspersed 2-spike events that we considered in the soft-threshold model. In the LIF model, a 2-spike event is more likely to follow a preceding 2-spike event than a preceding 1-spike event, because in the preceding 2-spike event, a second-to-top neuron is already close to threshold, ready to create a spurious spike with relatively high probability. To understand the magnitude of the additional variance contributed to σ 2 ξ due to 2-spike event runs of length χ, one can repeat the calculation for Eq. (S45) with 2-spike event run replacements instead of individual 2-spike event replacements, and one finds that the additional variance to ξ scales as χ/N 3 . So as long as χ scales < O(N ), we can still safely neglect this additional variance contribution, and our result of σ ξ = σ/ √ 2N still holds to leading order in N .
Equipped with the variance σ 2 ξ of the readout at time t mid , the next step toward computing σ readout is computing the integrated squared deviation s i for an event, because σ readout is simply a sum of the s i (Eq. (S20)). We can categorize events by spike-number just like we did in the soft-threshold model, so that we can use the simplified expression for σ readout , Eq. (S30), that expresses the sum over all s i as a weighted sum of expectation values of s i (l), where s i (l) is defined as s i conditioned on the event containing l + 1-spikes. To compute an instance of s i (l) then, we can take our expression for s i in the previous section (Eq. (S75)) and modify it to include the fact that the event s i (l) contains l spurious spikes. We obtain the equation below, and explain the modifications we made in turn.
First, we have introduced the random variable t i f p (l), which is the duration of an l + 1 spike event. Since the membrane potentials of an l + 1-spike event must recover to threshold from a decrement of l + 1, the moments of t i f p (l) are given by the first-passage time distribution of a particle undergoing Brownian motion (this is using our assumption of small λ V ) with mean (l + 1)τ /N and standard deviation that is O(1/N 3/2 ), which we can neglect from our calculations as we did in the previous section. Second, we have added the term l N to denote the increment in the readout near the beginning of the event due to the spurious spikes that occur within a time ∆ from the beginning of the event. Note that we have collapsed the spike-times of the l spurious spikes over a time interval of size ∆, to a point time-interval at the beginning of the event. To understand why this approximation accounts for the leading-order effect of delays, recall that the time ∆ τ /N , and thus the contribution to the integral in Eq. (S86) from the O(∆) spike-time variability of the l spurious spikes is much less than the contribution of the l N term, which is integrated over the entire O(τ /N ) duration of the event. Thus, collapsing the spike-times serves to capture the leading-order effect of the delay ∆-that of the l N term. Now, to evaluate Eq. (S86), we still need to compute the mean readout x(t) t . But instead of going through another calculation to evaluate x(t) t , we note that we are already describing an upper-bound for σ readout because the result from the previous section (Eq. (S82)) is an upper-bound for general λ V , and our estimate for the number of spurious spikes λ (Eq. (S85)) is also an overestimate. Thus here too, we have the freedom to make an overestimate in our calculation of s i (l), and the usage of such an s i (l) will ultimately result in an upper-bound for σ readout . We take advantage of this freedom to avoid the additional complexity of calculating x(t) t , and instead substitute the result x(t) t = E(ξ) from the case of zero delays (Eq. (S78)) into Eq. (S86). Clearly, x(t) t may deviate from this result because in the case of delay ∆ > 0, there are 1 + l-spike events to consider, and not just 1-spike events. But crucially, the actual value of x(t) t minimizes the mean squared deviation σ readout because of a general property of computations for mean-squared-error (Eq. (S17)). Namely, calculating deviations relative to the mean of a random variable minimizes mean-squared error. Thus any value we substitute for x(t) t can only yield the exact value or overestimate for the expectation of s i (l), s i (l) i , which is the quantity we will use to compute σ readout in Eq. (S30). Therefore, let us implement this substitution, keeping in mind that this provides an overestimate. Eq. (S86) becomes where we have cancelled E(ξ) in the second line and introduced the unit Gaussian random variable y i to express the variation in ξ i . Taking the expectation over the s i (l), we then obtain Then, to compute σ readout , we can substitute Eq. (S89) into Eq. (S30): where p(l) is a Poisson distribution as defined in Eq. (S24), but with λ from Eq. (S85), and we recall that the denominator represents the mean duration of an event. Now, recalling that we made several overestimates during our derivation-the small λ V limit corresponds to an upper-bound for σ readout for general λ V , our estimate for λ was an overestimate (Eq. (S85)), and our choice to compute deviations relative to the E(ξ) instead of the mean readout was an overestimate-and that we made the central limit theorem approximation in the computation of σ ξ , this expression for σ readout is in fact an approximate upper-bound. So we write where we have also dropped high-order terms in λ because we are interested in networks with high performance (λ 1). Importantly, Eq. (S91) can be evaluated numerically (numerically evaluating Eq. (S85)) and compared against simulation (Figure 4). Furthermore, we can numerically compute the minimal readout standard deviation, σ * readout , and the associated optimal noise level, σ * (Figure 4). Finally, we numerically differentiate σ * readout and σ * with respect to the delay δ and observe that σ * readout and σ * grow as σ * readout ∼ (δ/τ ) 2/3 and (S94) for δ τ , which matches the behavior of the soft-threshold model, Eq. (S39) and Eq. (S38) ( Figure D). Figure D: Numerically minimizing Eq. (S91) with respect to σ, we find that σ * readout and σ * (top/bottom, solid blue curve) grow with small delay δ τ as ∼ (δ/τ ) 2/3 and ∼ (δ/τ ) 1/3 , respectively (top/bottom, dashed orange line). Note that this limiting behavior matches that of the soft-threshold model. Noise enters the soft-threshold model as the standard deviation of the first-spike time, 1 N ρ . And in the soft-threshold model, λ * ∼ (δ/τ ) 2/3 implies that 1 ρ * = δ λ * ∼ (δ/τ ) 1/3 , which matches the σ * ∼ (δ/τ ) 1/3 that we have here for the LIF model.

B Simulation details
We use τ = 1 in all simulations.

B.1 Soft-threshold model
In Figure 2b, our simulations take advantage of the small delay ∆ (∆ = δ/N , where δ τ ) and large N limit of the soft-threshold model to efficiently compute spike-times t k in Eq. (S6) and empirically estimate σ readout . We first consider when all the neurons cross threshold together, and the neurons spike with total probability rate N ρ. The first-spike time is an exponential random variable with mean 1/N ρ, and so in our simulations, we simply draw the firstspike time from this exponential distribution. Next, we consider the time after the first spike, during the delay ∆. The other N − 1 ≈ N neurons continue to spike probabilistically, with each neuron's next-spike time exponentially distributed with mean 1/ρ, in the absence of inhibition. Here we draw N next-spike times from this exponential distribution. Of course, inhibition arrives after a time ∆, and so the only neurons that spike are the ones with next-spike time less than ∆; we take these as the spuriously spiking neurons, and their spike times are given by the first-spike time, plus the next-spike time for each neuron. After the inhibition arrives from the first-spike neuron, the entire population is subthreshold, and the population returns to threshold a time (1 + l)τ /N after the first spike, where l is the number of spurious spikes. At this point, we repeat our procedure, starting by drawing a new first-spike time, and so on. This yields a list of spike-times, which we insert into Eq. (S6). We use N = 32 neurons, run the simulation for 3 × 10 4 first-spikes, and sample 1.5 × 10 4 random times inx(t) to empirically calculate σ readout . We sample from within the last 2 × 10 4 spikes in the simulation, to ensure that the readout has reached the steady state.
This simulation protocol has the advantage that it avoids traditional Eulertimestep discretized time, which allows us to easily examine small delays. However, notably, this protocol does not consider the possibility that the first spike takes so long to occur that the population travels so far above threshold and the first spike is insufficient to return the population to subthreshold. Importantly, for small δ, ρ is large, and the mean first-spike time is small. Furthermore, very large first-spike times are exponentially suppressed; thus the simulation is accurate for small δ.

B.2 LIF model
In Figures 3 and 4, we perform Euler time-step simulations of Eq. (S1) with dt = 0.0001, and we use the second half of the simulation when computing σ readout .
For Figure 3a, we use N = 32, σ = 0.1, and 1562500 time-steps in each simulation (each dot). For Figure 3c, we use N = 64 and 781250 time-steps.

B.3 Simulation code
We include here our Euler time-step simulation code used for the LIF model. We have described the other, simpler computations supporting this work directly in the text. V_dot += membrane_noise/sqrt(dt) * randn(N) end